Adaptive Distance Metrics for Nearest Neighbour Classification Based on Genetic Programming
Authors
Abstract
Nearest Neighbour (NN) classification is a widely used and effective method for both binary and multi-class problems. It relies on the assumption that class conditional probabilities are locally constant. In high dimensions this assumption breaks down and severe bias can be introduced, which degrades the performance of the method. A locally adaptive distance metric then becomes crucial for keeping class conditional probabilities approximately uniform, so that better classification performance can be attained. This paper presents a locally adaptive distance metric for NN classification based on a supervised learning algorithm (Genetic Programming) that learns a vector of feature weights for the features composing a query instance. Used within a weighted Euclidean distance metric, these weights adapt the neighbourhood shape to the query location, stretching the neighbourhood along the directions in which the class conditional probabilities change least. Initial empirical results on a set of real-world classification datasets show that the proposed method improves the generalisation performance of the standard NN algorithm and is competitive with other learning algorithms for pattern classification.
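For illustration only, the sketch below shows a weighted Euclidean metric and the resulting weighted k-NN vote of the kind the abstract describes. It assumes a per-feature weight vector w has already been produced by some learner (in the paper, the Genetic Programming component, which is not reproduced here); the function and variable names are hypothetical, not the authors' implementation.

```python
import numpy as np

def weighted_euclidean(x, q, w):
    """Weighted Euclidean distance between a stored instance x and a query q.

    w is a non-negative per-feature weight vector: small weights stretch the
    neighbourhood along features whose class conditional probabilities change
    little, while large weights shrink it along more discriminative features.
    """
    diff = x - q
    return np.sqrt(np.sum(w * diff * diff))

def weighted_knn_predict(X_train, y_train, q, w, k=1):
    """Majority-vote k-NN prediction for query q under the weighted metric."""
    dists = np.sqrt((w * (X_train - q) ** 2).sum(axis=1))  # distance to every training point
    nearest = np.argsort(dists)[:k]                        # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two features, where only the first one is informative for the class.
X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -2.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 0.01])  # down-weight the uninformative second feature
print(weighted_knn_predict(X, y, np.array([0.95, 5.0]), w, k=3))  # -> 1
```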
Similar Resources
An efficient weighted nearest neighbour classifier using vertical data representation
The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, using vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to assign weights to different neighbours. We compare PIN...
Nearest Neighbour Distance Matrix Classification
Distance-based classification is one of the popular methods for classifying instances using a point-to-point distance based on the nearest neighbour or k-nearest neighbour (k-NN). The distance measure used can be any of the various measures available (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other specific distance measures). In this paper, we propose ...
Efficient Nearest-neighbour Searches Using Weighted Euclidean Metrics
Building an index tree is a common approach to speed up the k nearest neighbour search in large databases of many-dimensional records. Many applications require varying distance metrics by putting a weight on different dimensions. The main problem with k nearest neighbour searches using weighted Euclidean metrics in a high-dimensional space is whether the searches can be done efficiently. We presen...
Hesitant Fuzzy k-Nearest Neighbour (HFK-NN) Classifier for Document Classification and Numerical Result Analysis
This paper presents a new approach to document classification and numerical result analysis based on the Hesitant Fuzzy K-nearest neighbour (HFK-NN) classifier. The proposed HFK-NN classification approach is based on hesitant fuzzy distance, and we use hesitant fuzzy distance calculations to obtain the document classification results. The following steps are used for classif...
Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics
The k-nearest neighbour (k-NN) technique, due to its interpretable nature, is a simple and very intuitively appealing method to address classification problems. However, choosing an appropriate distance function for k-NN can be challenging and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance...
Journal title:
Volume Issue:
Pages: -
Publication date: 2013